DVD: A Model for Event Diversified Versions Discovery

Authors

Liang Kong

Rui Yan

Yijun He

Yan Zhang

Zhenwei Zhang

Li Fu

Abstract

With the development of Event Detection and Tracking techniques, it is feasible to gather text from many sources and structure it into events that are constructed automatically online and updated over time. An event is usually described by several diversified versions, and users are often eager to know all of them. Given the huge quantity of documents, however, it is almost impossible for users to read everything. In this paper, we formally define the problem of event diversified versions discovery and introduce a novel, principled model (called DVD) for discovering the diversified versions of events. Unlike traditional clustering methods, we apply an iterative algorithm on a bipartite graph that integrates co-occurrence and semantics to select the popular words and filter them, reducing the tight correlation between documents in a specific event. Hybrid link structures between words are used to find hierarchical relationships. We employ a web community discovery algorithm to construct virtual-documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, documents are then classified into the diversified versions. Using our novel evaluation method, experiments on two real datasets show that DVD is effective and outperforms various related algorithms, including classic K-means and LDA.
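The final classification step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each virtual-document is a plain bag of words acting as a class prototype and that a document is assigned to the prototype with the highest cosine similarity, which is the usual nearest-prototype reading of Rocchio classification. The function names (`cosine`, `classify`), the raw term-count weighting, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(doc_tokens, virtual_docs):
    """Assign a document to the most similar virtual-document (prototype).

    doc_tokens: list of word tokens for one document.
    virtual_docs: dict mapping a version label to its bag of words.
    Raw term counts are an assumption; the paper may weight terms differently.
    """
    doc_vec = Counter(doc_tokens)
    best_label, best_score = None, -1.0
    for label, words in virtual_docs.items():
        score = cosine(doc_vec, Counter(words))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy example: two hypothetical versions of the same event.
virtual_docs = {
    "version_a": ["earthquake", "casualties", "rescue"],
    "version_b": ["earthquake", "economy", "damage", "insurance"],
}
doc = "rescue teams report casualties after earthquake".split()
label, score = classify(doc, virtual_docs)
print(label, round(score, 3))
```

In this sketch the word shared by both versions ("earthquake") contributes equally to both scores, which hints at why the abstract describes filtering popular words before the versions are built.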

Similar resources

One Film, or Many?: The Multiple Texts of the Colonial Korean Film Volunteer

Until recently, studies on films from colonial Korea in the Japanese empire had to rely primarily on secondary texts, such as memoirs, journal and newspaper articles, and film reviews. The recent discovery of original film texts in archives in Japan, China, Russia, and elsewhere, and their availability in DVD format, prompted an important turning point in the scholarship. However, juxtaposing ...

Considering Uncertainty in Modeling Historical Knowledge

Simplifying and structuring qualitatively complex knowledge, quantifying it in a certain way to make it reusable and easily accessible are all aspects that are not new to historians. Computer science is currently approaching a solution to some of these problems, or at least making it easier to work with historical data. In this paper, we propose a historical knowledge representation model takin...

Mining Event Temporal Boundaries from News Corpora through Evolution Phase Discovery

Currently, a flood of news spreads throughout the web. The techniques of Event Detection and Tracking make it feasible to gather and structure text information into events which are constructed online automatically and updated temporally. Users are usually eager to browse the whole event evolution. With the huge quantity of documents, it is almost impossible for users to read all of them. In this pa...

Information Discovery and the Long Tail of Motion Picture Content

Recent papers have shown that, in contrast to "the Long Tail" theory, movie sales remain concentrated in a small number of hits. These papers have argued that concentrated sales can be explained, in part, by heterogeneity in quality and increasing returns from social effects. Our research analyzes an additional explanation: how incomplete information may skew sales patterns. We use the movie br...

Designing a model for holding mega sport events with an emphasis on national brand development

The present study seeks a model for holding major sporting events with an emphasis on national brand development. The research method is mixed, combining qualitative and quantitative approaches. In the quantitative part, the statistical population included professors and sports activists, and the statistical sample was drawn by stratified random sampling. The adequate number for modeling in PLS software was 300 ...

Publication date: 2011
